SNOW-1897441: Fix missing row position sort in DataFrame.describe #2950
Merged
sfc-gh-joshi merged 2 commits into main on Jan 28, 2025
sfc-gh-lmukhopadhyay
approved these changes
Jan 28, 2025
Contributor
sfc-gh-lmukhopadhyay
left a comment
LGTM, thanks Jonathan!
sfc-gh-jjiao
approved these changes
Jan 28, 2025
sfc-gh-helmeleegy
approved these changes
Jan 28, 2025
Contributor
sfc-gh-helmeleegy
left a comment
Looks good, thanks Johnathan.
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1897441
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
When calling `describe` on a DataFrame with object columns, pandas reports a `top` entry identifying the value that appears most frequently. If the top two values share the same frequency, the pandas documentation indicates that it does not provide any stability guarantees about which one is reported.
Tests involving this behavior are currently failing on QA6, where it appears that the order of results returned by a GROUP BY/COUNT query has changed. This PR adds an additional sort on the row position column to ensure that the object value that appears first is always chosen first. This may not always agree with pandas (though pandas does this in all of our current tests), but it at least keeps results the same between prod and QA6.
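The tie-breaking idea can be illustrated with a small, self-contained sketch in plain Python. This is not the Snowpark pandas implementation: the helper names (`group_counts`, `top_by_count_only`, `top_with_row_position`) and the sample data are hypothetical, and the reversed list merely stands in for a backend returning GROUP BY/COUNT results in a different order.

```python
def group_counts(values):
    """Mimic GROUP BY/COUNT: one (value, count, first_row_position) tuple per distinct value."""
    groups = {}
    for row_position, value in enumerate(values):
        if value in groups:
            count, first_pos = groups[value]
            groups[value] = (count + 1, first_pos)
        else:
            groups[value] = (1, row_position)
    return [(value, count, first_pos) for value, (count, first_pos) in groups.items()]


def top_by_count_only(groups):
    # Ties are resolved by whichever tied group happens to arrive first.
    return max(groups, key=lambda g: g[1])[0]


def top_with_row_position(groups):
    # Highest count wins; among ties, the value with the smallest first row position wins.
    return min(groups, key=lambda g: (-g[1], g[2]))[0]


values = ["b", "a", "a", "b", "c"]      # "a" and "b" both appear twice
groups = group_counts(values)
reordered = list(reversed(groups))       # same groups, different arrival order

print(top_by_count_only(groups), top_by_count_only(reordered))          # b a
print(top_with_row_position(groups), top_with_row_position(reordered))  # b b
```

With the count-only selection, the reported value depends on the order in which the grouped rows arrive. Adding the first row position as a secondary sort key makes the result independent of that order.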
I ran `tests/integ/modin/frame/test_describe.py` with a QA6 account to verify everything passes.